```python
import tensorflow as tf
import tensorflow_probability as tfp

tfb = tfp.bijectors
tfd = tfp.distributions

# Per-patient site assignments and cumulative adverse event counts.
sites = tf.constant(data['site_number'])
observed_ae = tf.constant(data['ae_count_cumulative'])
unique_sites, sites_idx, sites_counts = tf.unique_with_counts(sites)

mdl_ae = tfd.JointDistributionSequential([
    # alpha
    tfd.Gamma(1, 1, name='alpha'),
    # beta
    tfd.Gamma(1, 10, name='beta'),
    # Poisson rates for all sites
    lambda beta, alpha: tfd.Sample(tfd.Gamma(alpha, beta),
                                   sample_shape=unique_sites.shape,
                                   name='rates'),
    # observed AEs
    lambda rates: tfd.Independent(
        tfd.Poisson(tf.gather(rates, sites_idx, axis=-1)),
        reinterpreted_batch_ndims=1,
        name='observations')
])
```
This post presents a practical application of probabilistic programming and explores some technical details of a recent preprint where Timothé Ménard and I proposed a Bayesian model for the detection of adverse event underreporting in clinical trials.
A clinical trial is usually run across several investigator sites, which are required to report to the trial sponsor adverse events experienced by enrolled subjects. This is necessary to assess the safety of the intervention under investigation. Underreporting from certain sites has been a recurrent issue. Trial sponsors and health authorities rely on audits and inspections to ensure completeness of the collected safety data. This effort can and should be informed by statistical analysis of the adverse event reporting process with a focus on identifying sites that report at lower rates.
The main hurdle is that you are looking for missing data that you do not know is missing. We developed a solution around an observational model for adverse event reporting and a way of characterizing outlying investigator sites.
The relevant data is the count of adverse events reported by every patient enrolled in a study and their site assignment. To illustrate our methodology, we use data from the control arm of NCT00617669, an oncology study from Project Data Sphere.
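In code, we assume this data lives in a pandas DataFrame `data` with one row per enrolled patient, which is what the model listing at the top of this post consumes. A minimal hypothetical illustration of the expected shape, with values reconstructed from the first two sites of the results table further down:

```python
# Hypothetical illustration of the expected input: one row per enrolled
# patient, holding the site assignment and the cumulative AE count.
import pandas as pd

data = pd.DataFrame({
    'site_number':         [3001, 3001, 3002, 3002, 3002, 3002, 3002, 3002, 3002, 3002],
    'ae_count_cumulative': [   4,    1,    2,    2,    1,    2,    5,    5,    5,    1],
})
```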
Bayesian data analysis combines expert knowledge on the process of interest, expressed as a probabilistic model, with observed data to infer posterior distributions of the model parameters. These posterior distributions allow us to quantify uncertainties and risks, and to compute posterior expectation values of quantities of interest.
Hierarchical models provide a natural framework for subgroup analysis, where similar entities such as the sites of a single study can share information while maintaining a certain degree of independence. In our situation, at the bottom of the hierarchy, the count of adverse events \(Y_{i,j}\) reported by patient \(j\) among the \(n_i\) patients of site \(i\) can be modelled with a Poisson distribution, \(Y_{i,j} \sim \mathrm{Poi}(\lambda_i)\). The \(N_{\mathrm{sites}}\) Poisson rates \(\lambda_i\) can in turn be modelled as realizations of a random variable unique to the whole study with Gamma distribution \(\Gamma(\alpha, \beta)\). The parameters \(\alpha\) and \(\beta\) are unknown, so we assume a vague prior for both of them, \(\alpha \sim \mathrm{Exp}(1)\) and \(\beta \sim \mathrm{Exp}(10)\). The full joint distribution \[ P(\alpha, \beta, \lambda_i, Y_{i,j}) = P(\alpha)P(\beta)\prod_{i=1}^{N_{\mathrm{sites}}}P(\lambda_i\vert\alpha,\beta)\prod_{j=1}^{n_i} P(Y_{i,j}\vert \lambda_i) \] is summarized in the following graphical representation.
Numerical modelling of such joint distributions is made easy by probabilistic programming libraries such as TensorFlow Probability (used here), Stan, PyMC3 or Pyro.
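As a quick sanity check of the joint model `mdl_ae` defined at the top of this post (a minimal sketch), we can draw prior samples and evaluate their joint log-density:

```python
# Draw 7 joint prior samples: a list [alpha, beta, rates, observations].
prior_samples = mdl_ae.sample(7)
# Evaluate the joint log-density of those samples; result has shape (7,).
log_probs = mdl_ae.log_prob(prior_samples)
```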
In Bayesian inference, the analytical derivation of the posterior distribution is often impossible, but the same probabilistic programming libraries provide efficient implementations of MCMC algorithms that draw samples from the posterior.
```python
dtype = tf.dtypes.float32
nchain = 10
burnin = 1000
num_steps = 10000

# Initialize the chains from a prior sample.
alpha0, beta0, rates0, _ = mdl_ae.sample(nchain)
init_state = [alpha0, beta0, rates0]
step_size = [tf.cast(i, dtype=dtype) for i in [.1, .1, .1]]

# Joint log-probability with the observed AE counts pinned.
target_log_prob_fn = lambda *init_state: mdl_ae.log_prob(
    list(init_state) + [tf.cast(observed_ae, dtype=dtype)])

# Bijectors to map the constrained (positive) parameters to the real line.
unconstraining_bijectors = [
    tfb.Exp(),
    tfb.Exp(),
    tfb.Exp()
]

@tf.function(autograph=False, experimental_compile=True)
def run_chain(init_state, step_size, target_log_prob_fn, unconstraining_bijectors,
              num_steps=num_steps, burnin=burnin):

    def trace_fn(_, pkr):
        return (
            pkr.inner_results.inner_results.is_accepted
        )

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=tfp.mcmc.HamiltonianMonteCarlo(
            target_log_prob_fn,
            num_leapfrog_steps=3,
            step_size=step_size),
        bijector=unconstraining_bijectors)

    hmc = tfp.mcmc.SimpleStepSizeAdaptation(
        inner_kernel=kernel,
        num_adaptation_steps=burnin
    )

    # Sampling from the chain.
    [alpha, beta, rates], is_accepted = tfp.mcmc.sample_chain(
        num_results=num_steps,
        num_burnin_steps=burnin,
        current_state=init_state,
        kernel=hmc,
        trace_fn=trace_fn)
    return alpha, beta, rates, is_accepted
```
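Sampling then boils down to a single call to the compiled function, with all arguments as defined above:

```python
alpha, beta, rates, is_accepted = run_chain(
    init_state, step_size, target_log_prob_fn, unconstraining_bijectors)
```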
To assess convergence, we sample several chains that we can inspect visually (here with the ArviZ package) to make sure that they converge to the same distribution, mix well, and do not display pathological autocorrelations.
If we have to monitor several studies, we might want to automate that process. In that case, we can compute statistics of the sampled chains such as effective sample sizes or \(\hat{R}\) and implement automatic checks, for instance that \(\hat{R}\) is sufficiently close to 1.
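Below is a minimal sketch of such an automated check with ArviZ, assuming `alpha`, `beta` and `rates` are the chains returned by `run_chain`, with shape `(num_steps, nchain, ...)`; the 1.01 threshold is a common heuristic rather than a universal rule. A summary of this kind, for a few selected parameters, is shown in the table below.

```python
import arviz as az
import numpy as np

# ArviZ expects arrays of shape (chain, draw, ...), so swap the first two axes.
posterior = az.from_dict(posterior={
    'alpha': np.swapaxes(alpha.numpy(), 0, 1),
    'beta':  np.swapaxes(beta.numpy(), 0, 1),
    'rates': np.swapaxes(rates.numpy(), 0, 1),
})

summary = az.summary(posterior)

# Flag the run if any r_hat strays too far from 1.
if not (summary['r_hat'] < 1.01).all():
    raise RuntimeError('some chains have not converged')
```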
|       | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_mean | ess_sd  | ess_bulk | ess_tail | r_hat |
|-------|--------|-------|--------|---------|-----------|---------|----------|---------|----------|----------|-------|
| 3001  | 3.193  | 1.246 | 1.081  | 5.565   | 0.022     | 0.016   | 3146.0   | 3146.0  | 2849.0   | 3764.0   | 1.0   |
| 3002  | 3.050  | 0.616 | 1.964  | 4.250   | 0.006     | 0.004   | 11787.0  | 11787.0 | 11502.0  | 17056.0  | 1.0   |
| 3003  | 6.043  | 1.695 | 3.084  | 9.280   | 0.022     | 0.015   | 6031.0   | 6031.0  | 5772.0   | 9765.0   | 1.0   |
| 3004  | 12.764 | 2.030 | 9.037  | 16.628  | 0.014     | 0.010   | 19819.0  | 19819.0 | 19489.0  | 28664.0  | 1.0   |
| alpha | 1.774  | 0.223 | 1.364  | 2.192   | 0.003     | 0.002   | 6352.0   | 6352.0  | 6253.0   | 16128.0  | 1.0   |
| beta  | 0.119  | 0.017 | 0.087  | 0.151   | 0.000     | 0.000   | 7576.0   | 7576.0  | 7441.0   | 18742.0  | 1.0   |
From the samples \((\hat{\alpha}, \hat{\beta}, \hat{\lambda}_i)\) of the Markov chain, we can estimate the posterior risk of underreporting. One way to do this is to compute the left tail area of each \(\lambda_i\) (remember that the index \(i\) enumerates the sites) under the distribution \(\Gamma(\hat{\alpha}, \hat{\beta})\) and average it along the trace of the Markov chain. This corresponds to the probability that a Poisson rate drawn randomly from the study-level distribution falls below the inferred Poisson rate of site \(i\), or, more explicitly, that a reference site from the same study would report fewer adverse events.
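As a sketch of this computation, assuming `alpha`, `beta` and `rates` are the posterior samples returned above, with shapes `(num_steps, nchain)`, `(num_steps, nchain)` and `(num_steps, nchain, n_sites)` respectively:

```python
# For each posterior draw, Gamma(alpha, beta).cdf(rate_i) is the probability
# that a randomly drawn reference site's rate falls below site i's inferred
# rate; averaging over draws and chains gives the per-site rate tail area.
rate_tail_area = tf.reduce_mean(
    tfd.Gamma(alpha[..., tf.newaxis], beta[..., tf.newaxis]).cdf(rates),
    axis=[0, 1])  # shape (n_sites,)
```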
|   | site | mean_ae_rate | std_ae_rate | rate_tail_area | observed_ae |
|---|------|--------------|-------------|----------------|-------------|
| 0 | 3001 | 3.20  | 1.25 | 0.09 | [4, 1] |
| 1 | 3002 | 3.05  | 0.62 | 0.08 | [2, 2, 1, 2, 5, 5, 5, 1] |
| 2 | 3003 | 6.04  | 1.69 | 0.22 | [7, 4] |
| 3 | 3004 | 12.76 | 2.03 | 0.52 | [3, 27, 8] |
| 4 | 3005 | 9.35  | 2.10 | 0.37 | [12, 6] |
| 5 | 3006 | 3.69  | 1.32 | 0.11 | [2, 4] |
| 6 | 3007 | 6.03  | 2.31 | 0.22 | [5] |
| 7 | 3008 | 17.95 | 1.72 | 0.69 | [11, 4, 16, 31, 23, 23] |
| 8 | 3009 | 6.52  | 1.76 | 0.24 | [6, 6] |
| 9 | 3010 | 14.71 | 0.92 | 0.59 | [21, 10, 6, 17, 10, 7, 26, 19, 18, 1, 18, 23, ... |
A lower value of this rate tail area thus indicates a higher risk of underreporting. This metric can be used by auditors and inspectors to prioritize their activities. Moreover, since it is the probability of a specified event, one can immediately compare the rate tail areas of sites from different studies. This is especially interesting for quality programs that oversee several trials.
This approach demonstrates the flexibility of Bayesian methods for building models that answer specific questions about a given process. In this example, the reporting rates of the different sites are the quantities of interest, but they are unobserved and have to be inferred from the available data with a mathematical model. The user-friendly APIs of modern probabilistic programming libraries, combined with efficient inference algorithms, have made this type of workflow much easier than in the past and will certainly fuel broader adoption in sectors that have not traditionally been driven by quantitative insights.